Based on the locational uncertainty EDA, we have decided to use a dataset that merges the CORINE and land use datasets using the coarser CORINE dataset where the Vaud land use dataset is unavailable. The spatial detail of the OpFo dataset (created for the structured sampling design based on detailed maps from the Swiss government) resulted in a lot of uncertainty in sample categorization, even assuming best GPS recording practices by the collectors. The other two datasets were much less ambiguous, with 10m buffers largely falling into a single category. Thus, we will try to use the greater thematic accuracy of the land use dataset where possible, with the CORINE dataset as essentially the baseline.
This script first explores the union of the two datasets, where the detailed land use category is assigned if available, and the CORINE category if it is not. For analyses, the joint legend will be simplified to a smaller number of categories.
The union of CORINE and the Vaud land use layers was performed in QGIS 3.4.11.
Using the method described in the coordinate uncertainty EDA, points with clearly unreliable coordinates are removed. We assume that the remaining coordinates were collected under good GPS conditions, with the location recorded where the ants were collected. We use a 5m buffer to account for the limitations of GPS coordinates collected by smartphones.
geo_exclude <- c("extrapolé mauvais",
"extrapolé (base tube précédent)",
"extrapolé (église par défaut)",
"extrapolé (gare par défaut)",
"extrapolé/corrigé (église par défaut)")
dec_thresh <- 4 # remove lat/lon with 0-3 decimals
pub_filt <- ant$pub %>%
filter(!is.na(GEOPRECISION)) %>%
filter(!GEOPRECISION %in% geo_exclude) %>%
filter(is.na(LATITUDE) | nchar(LATITUDE) >= (dec_thresh+3)) %>%
filter(is.na(LONGITUDE) | nchar(LONGITUDE) >= (dec_thresh+2))
pub.5m <- pub_filt %>% st_buffer(dist=5)
There are 153 categories in Vaud after overlaying the Vaud land use dataset with the CORINE dataset, prioritizing the Vaud land use dataset. First, we can just look at the distribution within Vaud for categories comprising at least 0.5% of the area.
Or scroll through all categories, sorted by percent coverage:
The 5m-buffered points need to be intersected with the landcover dataset. The categorization is assigned as the most abundant category within the 5m buffer, following methods in the locational precision EDA.
lc_VD <- st_make_valid(lc_VD)
full_obs.df <- intersect_with_composition(pub.5m, lc_VD, "cat_full")
st_write(full_obs.df, paste0(gis_dir, "opfo_corine_use_21781.shp"))
full_obs.df <- st_read(paste0(gis_dir, "opfo_corine_use_21781.shp")) %>%
categorize_buffers(., "cat_full", "full") %>%
mutate(across(starts_with("cat_"), as.character),
Dataset=if_else(grepl(":", cat_full), "CORINE", "VD_LU"),
code_full=paste0(str_sub(Dataset, 1, 2), "_", str_sub(cat_full, 1, 3)))
## Reading layer `opfo_corine_use_21781' from data source `/Users/tsz/Documents/unil/opfo_main/2_gis/data/VD_21781/opfo_corine_use_21781.shp' using driver `ESRI Shapefile'
## Simple feature collection with 7921 features and 4 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 496366 ymin: 116314 xmax: 585184 ymax: 202959
## CRS: 21781
In all of Vaud, there are 153 categories. Using the 5m buffers, there are 61 categories.
With the joint GIS layer, there is a bit more uncertainty compared to each dataset separately, but that is almost inevitable, since the large CORINE polygons are broken up by the smaller land use polygons, and the land use polygons now have CORINE polygons where before there was no data. Still, it is better than the OpFo dataset, with 85% of buffers capturing only a single land use / land cover type.
As hinted in the above plot, a large majority of collections (86.9%) fall into the CORINE polygons. Many of these are in urban (35.5%) or forested (25.3%) areas.
There are a few CORINE categories that are somewhat vague catch-alls. Looking at a satellite overlay in QGIS, the largest patches of 211: Non-irrigated arable land, 243: Land principally occupied by agriculture, with significant areas of natural vegetation, and 321: Natural grasslands are pastures, though not exclusively. In some areas, they include smaller patches of buildings or forest. This is inevitable with CORINE, which has a minimum mapping unit of 25ha. The documentation also only states a thematic accuracy of ≥85%, without additional detail.
Output categories to upload to google drive, where they will be aggregated.
lc_VD %>% st_set_geometry(NULL) %>%
group_by(Dataset, cat_full) %>% summarise(tot_area=as.numeric(sum(area))) %>%
ungroup %>%
mutate(prop_area=tot_area/sum(tot_area)) %>%
mutate(in_collections=cat_full %in% full_obs_props$cat_full,
prop_collections=full_obs_props$prop_area[match(cat_full,full_obs_props$cat_full)],
code_full=paste0(str_sub(Dataset, 1, 2), "_",
str_sub(cat_full, 1, 3))) %>%
arrange(desc(in_collections), Dataset, cat_full) %>%
write_csv("temp/corine_open_category_lookup.csv")
For several categories, it is not obvious (to me) how they should be re-categorized. These are shown here, with examples of satellite imagery. Some are only a few tubes.
Urban?
Agriculture (tilled?)? Mostly space between VD parcels.
Urban? Mostly space between VD parcels, but nearer to buildings than 211.
Pasture?
Mostly high elevation grasslands which are grazed
Transition? Pasture?
Transition? Pasture?
Pasture? Wetland?
Agricultural tilled?
Urban? Transition? Agriculture?
Forest?
Agriculture tilled? But maybe in a greenhouse…?